Colonization of the land by plants required major modifications in cellular structural 
composition and metabolism. Intercellular communication through plasmodesmata (PD) 
plays a critical role in the coordination of growth and cell activities. Changes in the 
form, regulation or function of these channels are likely linked to plant adaptation to 
the terrestrial environments. Constriction of PD aperture by deposition of callose is the 
best-studied mechanism in PD regulation. Glycosyl hydrolases family 17 (GHL17) are 
callose degrading enzymes. In Arabidopsis this is a large protein family, few of which 
have been PD-localized. The objective here is to identify correlations between evolution 
of this protein family and their role at PD and to use this information as a tool to predict 
the localization of candidates isolated in a proteomic screen. With this aim, we studied 
phylogenetic relationship between Arabidopsis GHL17 sequences and those isolated from 
fungi, green algae, mosses and monocot representatives. Three distinct phylogenetic 
clades were identified. Clade alpha contained only embryophyte sequences, suggesting 
that this subgroup appeared during land colonization in organisms with functional PD. 
Accordingly, all PD-associated GHL17 proteins identified so far in Arabidopsis thaliana and 
Populus are grouped in this 'embryophytes only' phylogenetic clade. Next, we tested the 
use of this knowledge to discriminate between candidates isolated in the PD proteome. 
Transient and stable expression of GFP protein fusions confirmed PD localization for 
candidates contained in clade alpha but not for candidates contained in clade beta. Our 
results suggest that GHL17 membrane proteins contained in the alpha clade evolved and 
expanded during land colonization to play new roles, among others, in PD regulation. 
INTRODUCTION 

Cell-to-cell communication is a requisite for the evolution of 
multicellular organisms. Plant intercellular connections 
(plasmodesmata, PD) are thought to have originated with the 
appearance of multicellularity in green algae, but their structural 
complexity presumably increased as a result of changes in cell-wall 
composition during adaptation to terrestrial environments (Lucas 
and Lee, 2004; Popper et al., 2011). Similarities between 
intercellular connections in charophytic algae and in early land 
plants suggest that they have a common evolutionary origin. 
Plasmodesmata occur in all embryophytes (including mosses) and, in 
their simplest form, also appear in representatives of charophytic 
green algae (Franceschi et al., 1994; Cook et al., 1997; Raven, 
1997; Graham et al., 2000; Qiu, 2008). The presence of a 
phragmoplast (p, an enlarged cytoplasmic connection formed in the 
later stages of plant cell mitosis) in the zygnematalean taxa 
suggests that PD likely originated during the evolution of 
phragmoplast-containing charophyceans (Figure 1). 

In their primary form, PD arise during cytokinesis, presumably via 
enclosure of endoplasmic reticulum by cell wall depositions 
(Hepler, 1981; Cook et al., 1997). Important features of plant PD 
(such as the neck constriction and a central desmotubule-like 
structure) appear in Chara species, but numerous modifications in 
PD ultrastructure and regulation are expected to have occurred 
since the colonization of land by plants (more than 400 million 
years ago). A more complete understanding of the evolutionary steps 
involved in the origin of plant PD, their function and regulation 
should be possible through the identification of 
plasmodesma-associated proteins and analysis of their evolutionary 
appearance in charophycean algae and land plants. 
Plasmodesma-associated proteins have been isolated in model plants, 
such as Arabidopsis and tobacco, using genetic and proteomic 
screens, but the composition of the channel in model and non-model 
organisms is far from being resolved (Faulkner and Maule, 2011). 
Genome sequencing projects and prediction tools for protein 
structure and targeting have proven useful to establish protein 
localization and function in different intracellular compartments 
(e.g., Pires and Dolan, 2010; Ma et al., 2011; Tardif et al., 
2012). Known PD proteins display characteristic features of 
membrane-localized proteins (such as secretory signal peptides, 
glycosyl phosphatidylinositol anchors or transmembrane domains) but 
no specific sequence signature for PD binding has yet been 
discovered. 

Recently we have obtained information on the identity of 
Arabidopsis PD proteins, including several callose 
(beta-1,3-glucan) metabolic enzymes (Levy et al., 2007; 
Fernandez-Calvino et al., 2011; Vaten et al., 2011; Benitez-Alfonso 
et al., 2013). Callose deposition at the PD neck region correlates 
with a reduction in symplastic transport during tissue maturation 
(Burch-Smith and Zambryski, 2012; Slewinski et al., 2012). Callose 
also acts as a reversible regulator of intercellular transport in 
response to developmental and environmental signals (Levy et al., 
2007; Benitez-Alfonso et al., 2010; Maule et al., 2011, 2013; Rinne 
et al., 2011; Zavaliev et al., 2011). This implies that the 
activity of callose biosynthetic (callose synthases, CalS) and 
degrading enzymes (glycosyl hydrolase family 17, GHL17) must be 
rapidly and efficiently regulated at PD sites. Not surprisingly, 
PD-associated CalS and GHL17 proteins have recently been identified 
(Guseman et al., 2010; Vaten et al., 2011; Slewinski et al., 2012; 
Benitez-Alfonso et al., 2013; Zavaliev et al., 2013). 

FIGURE 1 | Phylogenetic relationships of the species used in this study. 
The cladogram is based on the current view of land plant evolution (Qiu, 
2008). Members of the orders Mesostigmatales, Klebsormidiales, 
Zygnematales, Coleochaetales, and Charales form the charophytic green 
algae lineage (land plant ancestors). Representatives from these orders 
selected for this study are named in the figure. Embryophytes (such as the 
moss Physcomitrella patens and the vascular plant Arabidopsis thaliana) 
evolved from charophytic algae during land colonization. Phragmoplasts (p) 
were found in organisms belonging to the Coleochaetales and the Charales. 
Plasmodesmata (PD) appeared in all embryophytes. 

The role of plasmodesmata-localized GHL17 proteins in plant 
development and response to viral pathogens has been well 
established (Levy et al., 2007; Zavaliev et al., 2011; Burch-Smith 
and Zambryski, 2012). The identification of these enzymes in 
crop species could lead to the development of biotechnological 
approaches to improve plant growth and response to environ- 
mental and developmental signals. This task is hindered by the 
lack of tools to discriminate between plasma membrane (PM) 
and PD GHL17 proteins. Generation of fluorescent fusions and 
transgenics to determine intracellular localization will be required 
but, without any preliminary method to screen for candidates, 
this process could become very expensive and time-consuming, 
especially when dealing with large multigenic families such as 
GHL17. 



Callose metabolic enzymes are conserved in fungi, oomycetes, 
algae and plants, which indicates that this is a very ancient 
metabolic pathway (Bachman and McClay, 1996; Popper et al., 
2011). What is not known is when this pathway was recruited 
to play an active role in PD regulation. The answer to this 
question might lie in the evolutionary diversification of these 
enzymes to play PD-specific functions in land plants. 

In this paper we present evidence supporting a potential 
correlation between the evolutionary origin of GHL17 proteins and 
their likelihood to target PD sites. Through phylogenetic analysis 
we identified a clade of membrane proteins that appear to have 
diverged early during land plant adaptation to terrestrial 
environments. The intracellular localization of predicted membrane 
GHL17 proteins isolated from Arabidopsis and Populus suggests that 
this "embryophytes only" subgroup is enriched in PD proteins 
(Pechanova et al., 2010; Fernandez-Calvino et al., 2011; Rinne et 
al., 2011; Benitez-Alfonso et al., 2013; Zavaliev et al., 2013). We 
used this information for the preliminary screen of four candidates 
identified through the proteomic screen of PD-enriched cell wall 
fractions. Two of the proteins belonged to clade alpha and were 
previously described to localize at PD. We tested the localization 
of the two proteins that belonged to clade beta and found, through 
fluorescent imaging of mCitrine protein fusions, that they 
accumulate preferentially in the apoplast. Our results suggest that 
at least a portion of the GHL17 membrane proteins contained in 
clade alpha evolved in embryophytes differently from proteins 
contained in clade beta, to specifically target PD and control 
callose on site. 

MATERIALS AND METHODS 

RETRIEVAL OF GHL17 SEQUENCES AND ANALYSIS OF PROTEIN 
DOMAINS 

To isolate sequences containing the 1,3-beta-glucosidase domain 
(GH17) from charophycean algae, Physcomitrella patens and selected 
embryophytes (Arabidopsis thaliana, Populus trichocarpa and Oryza 
sativa), BLAST (Altschul et al., 1990) searches were performed 
using as queries five representative GHL17 sequences from 
Arabidopsis thaliana (At3g13560, At3g57260, At4g14080, At4g31140, 
At5g42100). For charophycean algae we searched the National Centre 
for Biotechnology Information (http://www.ncbi.nlm.nih.gov/) 
non-redundant (NR), high-throughput genome sequence (HTGS), whole 
genome shotgun (WGS), genome survey sequence (GSS) and expressed 
sequence tag (EST) databases. We obtained partial ESTs that were 
translated to amino acid sequences using the ExPASy Translate tool. 
The presence of the GH17 domain was confirmed in these sequences 
using the Conserved Domain (Marchler-Bauer et al., 2007) and SMART 
(http://smart.embl-heidelberg.de; Letunic et al., 2012) search 
engines. To isolate GH17 proteins from sequenced embryophyte 
genomes (Physcomitrella patens, Populus trichocarpa and Oryza 
sativa), a BLAST search against the RefSeq protein database for 
each specific organism was performed using as queries the same five 
Arabidopsis representatives listed above and the GHL17 consensus 
domain sequence (cl18348). Similarly, to isolate 
beta-1,3-glucanases from fungal representatives (Candida albicans, 
Aspergillus clavatus, Aspergillus fumigatus, Aspergillus niger, 
Candida glabrata, Debaryomyces hansenii, Ashbya gossypii, Fusarium 
graminearum, Kluyveromyces lactis, Saccharomyces cerevisiae, 
Scheffersomyces stipitis, Schizosaccharomyces pombe, Yarrowia 
lipolytica), the consensus domain sequence (cl18819) was used to 
search the reference genome databases. Only protein sequences 
containing the GH17 domain (confirmed in SMART) and predicted to be 
complete were considered. Aramemnon 
(http://aramemnon.uni-koeln.de/request.ep) was also used to search 
and/or confirm the identity of the proteins isolated in the Rice 
Annotation Project database or in Phytozome. 
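
The EST translation step can be illustrated with a short sketch. The code below is not the ExPASy tool itself; it is a minimal stand-in, assuming the standard genetic code and plain DNA strings, that translates a nucleotide sequence in all six reading frames so that the frame containing the GH17 domain can then be chosen by inspection or domain search. The function names (`translate`, `six_frame_translation`, `longest_orf`) are hypothetical and do not come from the paper.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering (NCBI table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate DNA from its first base; unknown codons become 'X'."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(dna: str) -> dict:
    """Return the three forward (+1..+3) and three reverse (-1..-3) frames."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


def longest_orf(protein: str) -> str:
    """Longest stretch between stop codons ('*') in a translated frame."""
    return max(protein.split("*"), key=len)
```

For a partial EST, the frame whose longest uninterrupted stretch matches the GH17 domain in a domain search would be the one carried forward to the alignment.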

To eliminate redundancies and/or identify overlapping regions in 
isolated ESTs, sequences obtained for each organism were aligned 
using Muscle (Edgar, 2004). The resulting sequences are summarized 
in Table 1. These were screened for characteristic features of this 
family, namely the presence of a secretory signal peptide (SP), a 
glycosyl phosphatidylinositol (GPI) anchor and a 
carbohydrate-binding module (X8), using the prediction programs 
SMART, SignalP 4.1 Server, Phobius, GPI-SOM, FragAnchor, PredGPI 
and BIG-PI (Eisenhaber et al., 2003; Fankhauser and Maser, 2005; 
Poisson et al., 2007; Pierleoni et al., 2008; Petersen et al., 
2011; Letunic et al., 2012). According to the results obtained, 
full-length sequences were classified into the following types: 
type 0 shows no obvious SP (non-secreted proteins); type 1 contains 
an SP and may or may not contain one or more X8 domains (predicted 
secreted proteins); type 2 contains an SP, one or more X8 domains 
and a GPI anchor; and type 3 contains an SP and a GPI anchor but no 
X8 domain. The presence of a GPI anchor in type 2 and 3 proteins 
was used to predict their membrane localization. The classification 
of the sequences analyzed is provided in Table 2. 
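
The type 0 to type 3 scheme above is a small decision rule, and it can be written down explicitly. The sketch below assumes that the outputs of the prediction programs have already been reduced to three values per sequence (SP present, number of X8 domains, GPI anchor present); the function and parameter names are illustrative and are not taken from the paper.

```python
def classify_ghl17(has_signal_peptide: bool, n_x8_domains: int,
                   has_gpi_anchor: bool) -> int:
    """Assign a GHL17 sequence to type 0-3 from its predicted features.

    type 0: no obvious signal peptide (non-secreted)
    type 1: signal peptide, with or without X8 domains, no GPI anchor
    type 2: signal peptide, at least one X8 domain, GPI anchor
    type 3: signal peptide, GPI anchor, no X8 domain
    """
    if not has_signal_peptide:
        return 0
    if has_gpi_anchor:
        return 2 if n_x8_domains >= 1 else 3
    return 1


def predicted_membrane_localized(ghl17_type: int) -> bool:
    """Types 2 and 3 carry a GPI anchor and are predicted membrane proteins."""
    return ghl17_type in (2, 3)
```

Under this rule a sequence with a signal peptide, one X8 domain and a GPI anchor is type 2 and is predicted to be membrane-localized, which is the category the two clade beta candidates tested later fall into according to Table 2.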

ALIGNMENTS, SEQUENCE CONSERVATION, AND PHYLOGENETIC 
ANALYSIS 

All sequences isolated from representatives of charophycean 
algae and fungi, P. patens, Oryza sativa and Arabidopsis thaliana 
(Table 1) were aligned using Muscle (Edgar, 2004). Sequences 
from algae were incomplete, which generated large gaps. These 
gaps were mostly avoided when only the domain was used. 
Therefore we constructed trees with both full sequences and the 
domain only. These alignments are provided in Supplementary 
data 1. MEGA5 (Tamura et al., 2013) was used to calculate the 
best-fitting model of amino acid evolution; it suggested 
WAG+G+F as the best model under the Akaike Information 
Criterion. Dendrograms were obtained using three different 
methods of tree reconstruction [maximum likelihood (ML), 
neighbor-joining (NJ) and Bayesian inference (Bayesian)]. A 
majority-rule consensus tree was built by Bayesian inference using 
MrBayes (Huelsenbeck and Ronquist, 2001). Convergence was reached 
after 960,000 generations (3,720,000 when using the domain only) 
and posterior probabilities were calculated for each clade. Using 
the same model, an ML analysis was performed with MEGA5 (Tamura 
et al., 2013) and bootstrap values were determined from a 
population of 100 replicates. An NJ tree was also generated using 
Phylip (Felsenstein, 1997), with bootstrap values likewise 
determined from a population of 100 replicates. The tree was 
visualized using FigTree (http://tree.bio.ed.ac.uk/software/figtree/). 
A similar protocol was followed for phylogenetic comparison 
of Arabidopsis thaliana and Populus trichocarpa sequences 



Table 1 | List of sequences used for constructing the phylogenetic trees.

Organism                      Identifier in this paper   Sequence identifier
Klebsormidium flaccidum       KfGHL17_1                  HO446722 + HO44?665*
Klebsormidium flaccidum       KfGHL17_2                  HO451810.1
Penium margaritaceum          PmGHL17_1                  [illegible]
Chaetosphaeridium globosum    [illegible]                [illegible]
Nitella mirabilis             NtGHL17_1                  JV792233.1
Nitella mirabilis             NtGHL17_2                  JV742253.1
Nitella mirabilis             NtGHL17_3                  JV760383.1
Physcomitrella patens         PpGHL17_1                  XP_001761806.1
Physcomitrella patens         PpGHL17_2                  XP_001772420.1
Physcomitrella patens         PpGHL17_3                  XP_001780679.1
Physcomitrella patens         PpGHL17_4                  XP_001762206.1
Physcomitrella patens         PpGHL17_5                  XP_001780506.1
Physcomitrella patens         PpGHL17_6                  XP_001779924.1
Physcomitrella patens         PpGHL17_7                  XP_001767901.1
Physcomitrella patens         PpGHL17_8                  XP_001771454.1
Physcomitrella patens         PpGHL17_9                  XP_001782572.1
Physcomitrella patens         PpGHL17_10                 XP_001773368.1
Physcomitrella patens         PpGHL17_11                 XP_001782548.1
Physcomitrella patens         PpGHL17_12                 XP_001772976.1
Physcomitrella patens         PpGHL17_13                 XP_001757439.1
Physcomitrella patens         PpGHL17_14                 XP_001754617.1
Physcomitrella patens         PpGHL17_15                 XP_001775842.1
Physcomitrella patens         PpGHL17_16                 XP_001762304.1
Physcomitrella patens         PpGHL17_17                 XP_001757144
Physcomitrella patens         PpGHL17_18                 [illegible]
Candida albicans              CaGHL17_1                  [illegible]
Aspergillus clavatus          AcGHL17_1                  XP_001269137.1
Aspergillus fumigatus         AfGHL17_1                  [illegible]
Aspergillus niger             AnGHL17_1                  XP_001397475.1
Candida glabrata              [illegible]                [illegible]
Debaryomyces hansenii         DhGHL17_1                  [illegible]
Ashbya gossypii               AgGHL17_1                  [illegible]
Fusarium graminearum          FgGHL17_1                  XP_383705.1
Kluyveromyces lactis          KlGHL17_1                  XP_455217.1
Saccharomyces cerevisiae      ScGHL17_1                  NP_011798.1
Scheffersomyces stipitis      [illegible]                [illegible]
Schizosaccharomyces pombe     [illegible]                [illegible]
Yarrowia lipolytica           YlGHL17_1                  XP_500465.1
Oryza sativa                  OsGHL17_1                  NP_001052739.1
Oryza sativa                  OsGHL17_2                  NP_001044874.1
Oryza sativa                  OsGHL17_3                  NP_001047027.1
Oryza sativa                  OsGHL17_4                  NP_001046220.1
Oryza sativa                  OsGHL17_5                  NP_001058028.1
Oryza sativa                  OsGHL17_6                  NP_001044198.1

(Continued) 



www.frontiersin.org 



May 2014 | Volume 5 | Article 212 | 3 



Gaudioso-Pedraza and Benitez-Alfonso 



Origin and evolution of PD-located GHL17 
Table 1 | Continued

Organism                      Identifier in this paper   Sequence identifier
Oryza sativa                  OsGHL17_7                  [illegible]
Oryza sativa                  OsGHL17_8                  NP_001049413.1
Oryza sativa                  OsGHL17_9                  [illegible]
Oryza sativa                  OsGHL17_10                 [illegible]
Oryza sativa                  OsGHL17_11                 [illegible]
Oryza sativa                  OsGHL17_12                 [illegible]
Oryza sativa                  OsGHL17_13                 [illegible]
Oryza sativa                  OsGHL17_14                 [illegible]
Oryza sativa                  OsGHL17_15                 [illegible]
Oryza sativa                  OsGHL17_16                 [illegible]
Oryza sativa                  OsGHL17_17                 [illegible]
Oryza sativa                  OsGHL17_18                 [illegible]
Oryza sativa                  OsGHL17_19                 [illegible]
Oryza sativa                  OsGHL17_20                 [illegible]
Oryza sativa                  OsGHL17_21                 [illegible]
Oryza sativa                  OsGHL17_22                 [illegible]
Oryza sativa                  OsGHL17_23                 [illegible]
Oryza sativa                  OsGHL17_24                 [illegible]
Arabidopsis thaliana          At2g05790                  [illegible]
Arabidopsis thaliana          At4g26830                  [illegible]
Arabidopsis thaliana          At5g55180                  [illegible]
Arabidopsis thaliana          At4g18340                  [illegible]
Arabidopsis thaliana          At1g30080                  [illegible]
Arabidopsis thaliana          At2g26600                  [illegible]
Arabidopsis thaliana          At3g15800                  [illegible]
Arabidopsis thaliana          At2g27500                  [illegible]
Arabidopsis thaliana          At5g42100                  [illegible]
Arabidopsis thaliana          At1g32860                  [illegible]
Arabidopsis thaliana          At5g24318                  [illegible]
Arabidopsis thaliana          At3g46570                  [illegible]
Arabidopsis thaliana          At2g39640                  [illegible]
Arabidopsis thaliana          At3g55430                  [illegible]
Arabidopsis thaliana          At5g42720                  [illegible]
Arabidopsis thaliana          At4g34480                  [illegible]
Arabidopsis thaliana          At2g16230                  [illegible]
Arabidopsis thaliana          At3g13560                  [illegible]
Arabidopsis thaliana          At1g11820                  [illegible]
Arabidopsis thaliana          At1g66250                  [illegible]
Arabidopsis thaliana          At2g01630                  [illegible]
Arabidopsis thaliana          At4g29360                  [illegible]
Arabidopsis thaliana          At5g56590                  [illegible]
Arabidopsis thaliana          At3g55780                  [illegible]
Arabidopsis thaliana          At3g61810                  [illegible]
Arabidopsis thaliana          At3g07320                  [illegible]
Arabidopsis thaliana          At3g23770                  [illegible]
Arabidopsis thaliana          At4g14080                  NP_193144.1
Arabidopsis thaliana          At5g58480                  NP_200656.2
Arabidopsis thaliana          At4g17180                  NP_193451.2
Arabidopsis thaliana          At5g64790                  NP_201284.1
Arabidopsis thaliana          At3g04010                  NP_187051.3
Arabidopsis thaliana          At5g18220                  NP_197323.1
Arabidopsis thaliana          At1g64760                  NP_001031232.1

(Continued) 
Table 1 | Continued

Organism                      Identifier in this paper   Sequence identifier
Arabidopsis thaliana          At2g19440                  NP_179534.1
Arabidopsis thaliana          At3g24330                  NP_189076.1
Arabidopsis thaliana          At5g20870                  NP_197587.1
Arabidopsis thaliana          At5g58090                  NP_200617.2
Arabidopsis thaliana          At4g31140                  NP_194843.1
Arabidopsis thaliana          At1g77790                  NP_177902.1
Arabidopsis thaliana          At1g77780                  NP_177901.1
Arabidopsis thaliana          At5g20390                  NP_197539.1
Arabidopsis thaliana          At5g20560                  NP_197556.1
Arabidopsis thaliana          At1g33220                  NP_174592.1
Arabidopsis thaliana          At5g20340                  NP_197534.1
Arabidopsis thaliana          At5g20330                  NP_197533.1
Arabidopsis thaliana          At4g16260                  NP_193361.4
Arabidopsis thaliana          At3g57270                  NP_191286.1
Arabidopsis thaliana          At3g57240                  NP_191283.2
Arabidopsis thaliana          At3g57260                  NP_191285.1
Populus trichocarpa           PtGHL17_1                  XP_002297638.2
Populus trichocarpa           PtGHL17_2                  XP_002304004.2
Populus trichocarpa           PtGHL17_3                  XP_002314794.2
Populus trichocarpa           PtGHL17_4                  XP_002305879.1
Populus trichocarpa           PtGHL17_5                  XP_006389594.1
Populus trichocarpa           PtGHL17_6                  XP_006371969.1
Populus trichocarpa           PtGHL17_7                  XP_002316783.2
Populus trichocarpa           PtGHL17_8                  XP_002333242.1
Populus trichocarpa           PtGHL17_9                  XP_002302861.2
Populus trichocarpa           PtGHL17_10                 XP_002318439.2
Populus trichocarpa           PtGHL17_11                 XP_006384505.1
Populus trichocarpa           PtGHL17_12                 XP_006379239.1
Populus trichocarpa           PtGHL17_13                 XP_002312097.1
Populus trichocarpa           PtGHL17_14                 XP_002312098.1
Populus trichocarpa           PtGHL17_15                 XP_002303070.2
Populus trichocarpa           PtGHL17_16                 XP_002298356.1
Populus trichocarpa           PtGHL17_17                 XP_002332000.1
Populus trichocarpa           PtGHL17_18                 XP_002317055.2
Populus trichocarpa           PtGHL17_19                 XP_002306003.2
Populus trichocarpa           PtGHL17_20                 XP_006385314.1
Populus trichocarpa           PtGHL17_21                 XP_002300505.2
Populus trichocarpa           PtGHL17_22                 XP_002300634.2
Populus trichocarpa           PtGHL17_23                 XP_002299750.2
Populus trichocarpa           PtGHL17_24                 XP_002312820.1
Populus trichocarpa           PtGHL17_25                 XP_002325214.2
Populus trichocarpa           PtGHL17_26                 XP_002328249.1
Populus trichocarpa           PtGHL17_27                 XP_002321273.1
Populus trichocarpa           PtGHL17_28                 XP_006386924
Populus trichocarpa           PtGHL17_29                 XP_002329975.1
Populus trichocarpa           PtGHL17_30                 XP_002321266.1
Populus trichocarpa           PtGHL17_31                 XP_002329954.1
Populus trichocarpa           PtGHL17_32                 XP_002315222.2
Populus trichocarpa           PtGHL17_33                 XP_002332466.1
Populus trichocarpa           PtGHL17_34                 XP_002329964.1
Populus trichocarpa           PtGHL17_35                 XP_002332467.1
Populus trichocarpa           PtGHL17_36                 XP_002324127.1

(Continued) 
Table 1 | Continued

Organism                      Identifier in this paper   Sequence identifier
Populus trichocarpa           PtGHL17_37                 XP_002329956.1
Populus trichocarpa           PtGHL17_38                 XP_002302261.1
Populus trichocarpa           PtGHL17_39                 XP_002313970.1
Populus trichocarpa           PtGHL17_40                 XP_002319699.1
Populus trichocarpa           PtGHL17_41                 XP_006372260.1
Populus trichocarpa           PtGHL17_42                 XP_002330836.1
Populus trichocarpa           PtGHL17_43                 XP_002308921.2
Populus trichocarpa           PtGHL17_44                 XP_002306606.2
Populus trichocarpa           PtGHL17_45                 XP_002299791.2
Populus trichocarpa           PtGHL17_46                 XP_002309443.2
Populus trichocarpa           PtGHL17_47                 XP_002310612.1
Populus trichocarpa           PtGHL17_48                 XP_002323325.2
Populus trichocarpa           PtGHL17_49                 XP_002314934.2
Populus trichocarpa           PtGHL17_50                 XP_002315775.2
Populus trichocarpa           PtGHL17_51                 XP_002308018.2
Populus trichocarpa           PtGHL17_52                 XP_002314086.1
Populus trichocarpa           PtGHL17_53                 XP_002324967
Populus trichocarpa           PtGHL17_54                 XP_002305174.1

The table includes the source organism, abbreviation used in this study and 
sequence identifier in NCBI. 

*This ORF was obtained by translating the sequence resulting from overlapping 
these two ESTs. 



(alignments provided in Supplementary data 2). In this case, 
convergence was reached after 45,000 generations. 
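
The model-selection step in the phylogenetic analysis relies on the Akaike Information Criterion, AIC = 2k - 2 ln L, where k is the number of free parameters and ln L the maximized log-likelihood; the model with the lowest AIC is preferred. The sketch below shows only that comparison. It is not MEGA5's implementation, and the function names and any numbers passed to them are placeholders, not results from this study.

```python
def aic(log_likelihood: float, n_parameters: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_parameters - 2 * log_likelihood


def best_model_by_aic(fits: dict) -> str:
    """Pick the model name with the lowest AIC.

    `fits` maps a model name to a (log_likelihood, n_parameters) tuple,
    for example the values reported by a model-testing program.
    """
    return min(fits, key=lambda name: aic(*fits[name]))
```

The criterion penalizes parameter count, so a richer substitution model is only selected when its likelihood gain outweighs the extra parameters.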

A graphical representation of the GH17 domain alignment was 
generated using WebLogo 3 (Crooks et al., 2004). In the logo, the 
overall height of each stack indicates the sequence conservation at 
that position. 
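
The stack height in a sequence logo is the information content of the alignment column, which for proteins is log2(20) minus the Shannon entropy of the observed residue frequencies. The sketch below computes that quantity per column. It is a simplified illustration, assuming a 20-letter alphabet, ignoring gap characters, and omitting the small-sample correction that logo software typically applies; the function names are hypothetical.

```python
import math
from collections import Counter


def column_information(column: str, alphabet_size: int = 20) -> float:
    """Information content (bits) of one alignment column, gaps ignored."""
    residues = [r for r in column.upper() if r not in ("-", ".")]
    if not residues:
        return 0.0
    total = len(residues)
    entropy = -sum(
        (n / total) * math.log2(n / total)
        for n in Counter(residues).values()
    )
    return math.log2(alphabet_size) - entropy


def alignment_information(sequences: list) -> list:
    """Per-column information content for equal-length aligned sequences."""
    return [column_information("".join(col)) for col in zip(*sequences)]
```

A fully conserved column reaches the maximum of log2(20), about 4.32 bits, while a column split evenly between two residues loses exactly one bit.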

GENERATION OF TRANSGENIC PLANT MATERIAL 

Construction of p35S-mCitrine-PdBG1 (At3g13560) was 
described elsewhere (Benitez-Alfonso et al., 2013). N-terminal 
and GPI-anchor domains were predicted for At4g31140 and 
At5g58090 using SignalP 4.1 Server and GPI-SOM (Fankhauser 
and Maser, 2005; Petersen et al., 2011). mCitrine protein 
fusions were obtained by overlapping PCR (Tian et al., 2004) 
and expressed in the binary vector pB7WG2.0 using Gateway 
technology. The mCitrine was fused in frame between amino 
acids 454 and 455 in the case of At4g31140 and between amino 
acids 445 and 446 in the case of At5g58090. 

Transient expression was verified by agroinfiltration in 
Nicotiana benthamiana leaves. Stable transgenic lines were gen- 
erated using the floral dip method, followed by selection with 
BASTA. T2 seeds were sterilized and germinated in long-day 
conditions on plates containing MS medium supplemented with 
BASTA (25 µg/ml). 

CALLOSE STAINING 

Callose deposition at PD was detected in plant samples 
vacuum-infiltrated with 0.1% (w/v) aniline blue in 0.1 M sodium 
phosphate (pH 9.0) and incubated in the dark for 1-2 h before 
imaging. 



Table 2 | Classification of embryophyte sequences based on protein 
structure and phylogenetic distribution.

Sequence identifier   Type          Branch
PpGHL17_1             1             [blank]
PpGHL17_2             1             [blank]
PpGHL17_3             1             [blank]
PpGHL17_4             0             [blank]
PpGHL17_5             1             [blank]
PpGHL17_6             1             [blank]
PpGHL17_7             2             [blank]
PpGHL17_8             0             [blank]
PpGHL17_9             0             [blank]
PpGHL17_10            2             beta
PpGHL17_11            1             [blank]
PpGHL17_12            2             [illegible]
PpGHL17_13            2             [illegible]
PpGHL17_14            1             beta
PpGHL17_15            1             beta
PpGHL17_16            0             beta
PpGHL17_17            0             [blank]
PpGHL17_18            0             [blank]
OsGHL17_1             0             alpha
OsGHL17_2             0             [blank]
OsGHL17_3             3             [blank]
OsGHL17_4             0             [blank]
OsGHL17_5             0             [blank]
OsGHL17_6             1             [blank]
OsGHL17_7             0             [blank]
OsGHL17_8             2             [blank]
OsGHL17_9             2             [blank]
OsGHL17_10            2             [blank]
OsGHL17_11            2             beta
OsGHL17_12            2             beta
OsGHL17_13            2             beta
OsGHL17_14            2             beta
OsGHL17_15            2             beta
OsGHL17_16            2             beta
OsGHL17_17            2             beta
OsGHL17_18            2             beta
OsGHL17_19            2             beta
OsGHL17_20            2             beta
OsGHL17_21            2             beta
OsGHL17_22            1             [blank]
OsGHL17_23            3             [blank]
OsGHL17_24            2             beta
At2g05790             [blank]       [blank]
At4g26830             [blank]       alpha
At5g55180             [blank]       alpha
At4g18340             [blank]       alpha
At1g30080             [blank]       alpha
At2g26600             3             alpha
At3g15800             3             alpha
At2g27500             1             alpha
At5g42100             3             alpha

(Continued) 
Table 2 | Continued

Sequence identifier   Type          Branch
At1g32860             3             alpha
At5g24318             1             alpha
At3g46570             1             alpha
At2g39640             1             [blank]
At3g55430             1             [blank]
At5g42720             3             [blank]
At4g34480             1             alpha
At2g16230             1             alpha
At3g13560             2             alpha
At1g11820             1             alpha
At1g66250             2             alpha
At2g01630             2             alpha
At4g29360             2             alpha
At5g56590             2             alpha
At3g55780             1             alpha
At3g61810             1             alpha
At3g07320             1             alpha
At3g23770             1             alpha
At4g14080             [blank]       alpha
At5g58480             2             [blank]
At4g17180             1             [blank]
At5g64790             2             [blank]
At3g04010             2             beta
At5g18220             2             beta
At1g64760             2             beta
At2g19440             2             beta
At3g24330             2             beta
At5g20870             2             beta
At5g58090             2             beta
At4g31140             2             beta
At1g77790             1             gamma
At1g77780             3             gamma
At5g20390             1             gamma
At5g20560             1             gamma
At1g33220             1             gamma
At5g20340             1             gamma
At5g20330             1             gamma
At4g16260             [blank]       gamma
At3g57270             [blank]       gamma
At3g57240             [blank]       gamma
At3g57260             [blank]       gamma
PtGHL17_1             2             alpha
PtGHL17_2             1             alpha
PtGHL17_3             2             alpha
PtGHL17_4             0             alpha
PtGHL17_5             2             alpha
PtGHL17_6             2             alpha
PtGHL17_7             [blank]       alpha
PtGHL17_8             [blank]       alpha
PtGHL17_9             [blank]       alpha
PtGHL17_10            [blank]       alpha
PtGHL17_11            [blank]       alpha
PtGHL17_12            1             alpha
PtGHL17_13            1             alpha
PtGHL17_14            1             alpha
PtGHL17_15            1             alpha
PtGHL17_16            1             alpha
PtGHL17_17            1             alpha
PtGHL17_18            3             alpha
PtGHL17_19            1             alpha
PtGHL17_20            3             alpha
PtGHL17_21            3             alpha
PtGHL17_22            1             alpha
PtGHL17_23            3             alpha
PtGHL17_24            1             alpha
PtGHL17_25            3             alpha
PtGHL17_26            0             alpha
PtGHL17_27            1             alpha
PtGHL17_28            1             alpha
PtGHL17_29            3             [blank]
PtGHL17_30            1             [blank]
PtGHL17_31            1             [blank]
PtGHL17_32            2             [blank]
PtGHL17_33            1             [blank]
PtGHL17_34            2             alpha
PtGHL17_35            1             [blank]
PtGHL17_36            2             [blank]
PtGHL17_37            1             alpha
PtGHL17_38            0             gamma
PtGHL17_39            2             [blank]
PtGHL17_40            2             [blank]
PtGHL17_41            2             [blank]
PtGHL17_42            1             [blank]
PtGHL17_43            1             gamma
PtGHL17_44            0             gamma
PtGHL17_45            1             gamma
PtGHL17_46            2             [blank]
PtGHL17_47            2             [blank]
PtGHL17_48            1             gamma
PtGHL17_49            1             gamma
PtGHL17_50            0             gamma
PtGHL17_51            1             gamma
PtGHL17_52            1             gamma
PtGHL17_53            2             [blank]
PtGHL17_54            3             alpha

The table classifies the sequences used in this paper according to the presence 
of signal peptide, X8 domain and/or GPI anchor as described in Materials and 
Methods. It also mentions the branch in the tree where this sequence appears. 
Consult Table 1 to access the sequence corresponding to each identifier in 
NCBI. 

MICROSCOPY 

Confocal analysis was performed on a Zeiss LSM700 inverted 
microscope using a 488 nm excitation laser for mCitrine, the 
405 nm laser for aniline blue fluorochrome and the 585 nm laser to 
detect chloroplast autofluorescence. Emission was collected using 
the following filters: BP 505-530 for mCitrine, the DAPI filter for 
aniline blue (463 nm) and the LP 615 filter for chloroplasts 
(581 nm). The images correspond to stacks of z-optical sections. 
Sequential scanning was used to image tissues expressing mCitrine 
and stained with aniline blue. 

RESULTS 

IDENTIFICATION OF GHL17 SEQUENCES IN CHAROPHYTES AND
EMBRYOPHYTES SUGGESTS GENE FAMILY EXPANSION

The presence of intercellular connections (phragmoplast and/or less evolved PD)
has been described in some species belonging to the Charophytes (Figure 1) but
so far, in this lineage, regulation of PD by callose metabolism has only been
demonstrated in embryophytes (Scherp et al., 2001; Schuette et al., 2009). The
presence of β-1,3 glucans in the cell wall of unicellular organisms indicates an
ancient origin for this metabolic pathway but how and when it evolved to control
PD transport is unknown (Sorensen et al., 2011). In an attempt to answer this
question, we isolated sequences encoding GH17 domains from charophytes,
bryophytes, and vascular plants. Based on the availability of sequence
information, we selected representative species from the charophycean orders:
Klebsormidiales (Klebsormidium flaccidum), Zygnematales (Penium margaritaceum),
Coleochaetales (Chaetosphaeridium globosum) and Charales (Nitella mirabilis).
Fourteen partial transcripts were isolated but only 7 (2 from Klebsormidium,
1 from Penium, 1 from C. globosum and 3 from Nitella) contained the key amino
acids forming the active site of GHL17 (Table 1).

Full-length GHL17 sequences were isolated from moss (Physcomitrella patens) and
from monocot (Oryza sativa) and dicot (Arabidopsis thaliana and Populus
trichocarpa) model plants using genome information and protein annotation
databases. In total we were able to identify 18 sequences in Physcomitrella,
24 sequences in Oryza sativa, 50 sequences in Arabidopsis thaliana and 54 in
Populus trichocarpa (Table 1). The increasing number of sequences isolated in
land plants with respect to those isolated in algae and moss suggests that an
expansion in this gene family has occurred during or immediately after land
colonization.

We used prediction tools to determine the structure and localization of the
proteins encoded by the sequences identified. This was not possible for algae
representatives because only partial transcripts were isolated. For moss, rice,
Arabidopsis and Populus sequences, secretory signal peptides (SP) and the
presence of C-terminal GPI anchoring domains were predicted using several
bioinformatics websites (see Materials and Methods). GHL17 sequences were also
classified according to the presence of one or more carbohydrate binding domains
(named X8 or CBM43). We classified sequences into 4 types according to the
presence of one or more of these features (see Materials and Methods and
Table 2). Types 2 and 3 displayed an SP and GPI-anchor signature that predicts
their localization at the PM or at membranous subdomains (such as PD). Of the
18 sequences isolated in Physcomitrella only 4 were classified as type 2. The
Arabidopsis genome contained 21 membrane-predicted sequences (42% of the total),
which were experimentally verified in a proteomic analysis (Borner et al., 2003).
The number of membrane-predicted GHL17 was very similar in rice and Populus
trichocarpa (22 in rice, 21 in poplar). When comparing moss and vascular plants,
a major increase in the number of predicted membrane-targeted proteins is
detected, consistent with the hypothesis that GHL17 evolved and expanded to
support or adopt specialized functions at membranous domains in terrestrial
environments.
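The proportions behind this comparison can be checked with a few lines of arithmetic. The sketch below only uses the counts quoted in the text (total sequences and membrane-predicted sequences per species); the dictionary layout and the function name are illustrative choices, not part of the original analysis.

```python
# Counts quoted in the text: (total GHL17 sequences, membrane-predicted sequences).
# The data structure and names are illustrative assumptions.
counts = {
    "Physcomitrella patens": (18, 4),
    "Oryza sativa": (24, 22),
    "Arabidopsis thaliana": (50, 21),
    "Populus trichocarpa": (54, 21),
}


def membrane_fraction(total, membrane):
    """Share of sequences predicted to be membrane-targeted (types 2 and 3)."""
    return membrane / total


for species, (total, membrane) in counts.items():
    share = membrane_fraction(total, membrane)
    print(f"{species}: {membrane}/{total} = {share:.0%}")
```

For Arabidopsis this reproduces the 42% stated above; the other ratios follow from the counts as written in the text and are not reported as percentages by the authors.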

KEY AMINO ACID RESIDUES IN THE GH17 DOMAIN ARE CONSERVED 
THROUGHOUT EVOLUTION 

Research on GHL17 protein structure revealed two strictly con- 
served glutamate residues that act as the proton donor and 
the nucleophile in all reactions catalyzed by glycosyl hydrolases 
(Jenkins et al., 1995; Wojtkowiak et al., 2013). A number of aro- 
matic and hydrophilic residues located near the catalytic cleft, 
presumably involved in substrate specificity and enzyme activity, 
are also conserved among all plant GHL17 proteins (Wojtkowiak 
et al., 2013).

To study the molecular evolution of the GH17 domain in green 
algae, moss and plants, we translated and aligned the domain 
region of the retrieved sequences using MEGA5 (Supplementary 
data 1). We also included sequences isolated from fungi repre- 
sentatives to analyze domain conservation in a different lineage. 
The results revealed that the glutamate catalytic residues (E) are 
highly conserved among all charophycean representatives, fungi 
and embryophytes (highlighted in red in the alignment shown 
in Supplementary data 1 and in Figure 2). Similarly, the residues 
surrounding the catalytic site are mostly conserved in all selected 
representatives (Supplementary data 1, Figure 2). Moreover, a
region containing the aromatic residues Tyr200 and Phe203 (positions
refer to the At2g05790 sequence), which is involved in substrate
interaction (Wojtkowiak et al., 2013), is also conserved in all
streptophytes (Figure 2).
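As a rough illustration of how per-column conservation can be read off an alignment, the sketch below scores five short aligned fragments transcribed from the alignment panel of Figure 2. The assignment of each fragment to a species label follows the order of the panel and is an assumption, the extraction of the panel is noisy, and the snippet does not establish which glutamate is catalytic; it only shows the counting step.

```python
from collections import Counter

# Aligned fragments transcribed from the Figure 2 alignment panel.
# Species labels follow the panel order and are an assumption.
fragments = {
    "NtGHL17_1": "ITVGNEAL",
    "PpGHL17_1": "LAVGNEVF",
    "OsGHL17_1": "ITVGNEVF",
    "At2g05790": "IAVGNEVF",
    "PtGHL17_1": "IAVGSEVL",
}


def column_conservation(seqs):
    """Return (most common residue, its frequency) for each alignment column."""
    seqs = list(seqs)
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    result = []
    for column in zip(*seqs):
        residue, count = Counter(column).most_common(1)[0]
        result.append((residue, count / len(seqs)))
    return result


conservation = column_conservation(fragments.values())
# Columns in which every sequence carries the same residue.
fully_conserved = [
    (index, residue)
    for index, (residue, freq) in enumerate(conservation)
    if freq == 1.0
]
print(fully_conserved)
```

In this toy input the invariant columns include a glutamate (E) flanked by conserved glycine, which is the kind of pattern the weblogo in Figure 2 summarizes for the full alignment.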

The high degree of similarity between the catalytic sites of 
GHL17 proteins in green algae, fungi and land plants supports 
the ancestral origins of this metabolic pathway. 

PHYLOGENY REVEALED A GROUP OF GHL17 PROTEINS THAT 
APPEARED IN EMBRYOPHYTES ONLY 

The phylogenetic distribution of Arabidopsis GHL17 sequences has been studied
before (Doxey et al., 2007). Based on tree topology, these proteins were grouped
into three distinct clades: α, β, and γ. Predicted membrane GHL17 were evenly
distributed in clades α and β. We investigated the evolutionary origin of these
clades by comparing the phylogenetic distribution of GHL17 sequences isolated
from charophycean green algae, fungi, Physcomitrella patens, Oryza sativa and
Arabidopsis thaliana. Although plants and fungi evolved in different lineages,
they share a common eukaryotic origin, which is reflected in the conservation of
key amino acids in the GH17 domain (Supplementary data 1).

Unrooted phylogenetic trees were generated using three search algorithms:
Bayesian inference (Bayesian), Maximum Likelihood (ML) and Neighbor Joining (NJ)
(Figure 3A and Supplementary data 3). The tree topology was generally well
supported by all 3 methods, with the exception of several higher order branches
in ML and NJ bootstrap values. The three phylogenetic clades (α, β, and γ)
described by Doxey et al. (2007) are color coded in Figure 3A. The selected
fungal sequences branch off at the same point as some algae representatives and
near the point of connection of plant sequences forming the clade beta. This
suggests a more ancestral origin for this clade (Figure 3B). Clades alpha and
gamma contained embryophytes only and, for the purpose of this paper, they could
be considered as a single clade (Figure 3C).

FIGURE 2 | Sequence conservation in the domain region of GHL17 proteins. The top
panels show the consensus region for GH17 using weblogo. This was obtained by
aligning all the sequences isolated from green algae and embryophytes (consult
Table 1 to obtain the NCBI identifier for these sequences). The bottom panel
shows an alignment of representative domain sequences from Nitella mirabilis
(NtGHL17_1), from moss (PpGHL17_1) and from the vascular plants Arabidopsis
thaliana (At2G05790), Oryza sativa (OsGHL17_1) and Populus trichocarpa
(PtGHL17_1). Conserved amino acids are highlighted in yellow in the alignment.
The position of the glutamate residues (E) actively involved in the catalytic
reaction is indicated with arrows in the weblogo and in red in the alignment.
Notice conserved domains around the catalytic sites. Tyr (Y) and Phe (F)
residues conserved in plants and presumably important in substrate binding are
indicated in green in the bottom panel.
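Distance-based methods such as Neighbor Joining start from a matrix of pairwise distances between aligned sequences. A minimal sketch of one simple distance, the p-distance (proportion of differing sites), is given below. The three fragments are short aligned stretches transcribed from the Figure 2 panel and are used purely as toy input; the fragment names are placeholders, and the choice of p-distance is an assumption, since the text does not state which distance model was used for the NJ trees.

```python
from itertools import combinations

# Toy aligned fragments; names are placeholders, not sequence identifiers.
fragments = {
    "fragment_1": "MINFYPYFA",
    "fragment_2": "LINCYPYFA",
    "fragment_3": "MLNAYPYFG",
}


def p_distance(a, b, gap="-"):
    """Proportion of aligned positions at which two sequences differ.

    Positions with a gap in either sequence are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(x != y for x, y in pairs) / len(pairs)


# Upper triangle of the pairwise distance matrix.
distances = {
    (first, second): p_distance(fragments[first], fragments[second])
    for first, second in combinations(fragments, 2)
}

for (first, second), value in distances.items():
    print(f"{first} vs {second}: {value:.3f}")
```

A tree-building program would take such a matrix as input; Bayesian and Maximum Likelihood methods instead work on the alignment itself under an explicit substitution model.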

Only partial transcripts were isolated for algae representatives; hence gaps
were introduced in the alignments that could affect the accuracy and reliability
of the trees. To confirm the tree topology, we manually eliminated these gaps to
generate trees containing only the sequence region encoding the domain (marked
in yellow in Supplementary data 1). As shown in Supplementary data 3, the
distribution of sequences in the different clades and the relationship between
the different branches were conserved in these "domain only" trees.

As in Arabidopsis, an even distribution of predicted membrane sequences between
the alpha and the beta clades was observed in rice (Figures 3B,C). Interestingly,
type 3 proteins were almost exclusively found in the alpha clade. In summary,
our phylogenetic analysis suggests that GHL17 membrane proteins contained in
clade alpha appeared in early embryophytes, presumably to adopt new functions at
the cell periphery.
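The "domain only" check described in this section relies on removing gapped regions from the alignment before rebuilding the trees. One common automated version of that step is complete deletion of every column that contains a gap, sketched below. This is an assumption about how such trimming could be done; the authors report trimming manually to the domain region. The five rows are a gapped stretch transcribed from the Figure 2 panel, and the row names are placeholders.

```python
# Toy gapped alignment; row names are placeholders.
alignment = {
    "row_1": "SVNDGGQYEHTL",
    "row_2": "T-TS-PQMSSQL",
    "row_3": "KGND-TALKANL",
    "row_4": "V-DT-HNTTSFL",
    "row_5": "T-SI-PNLVTVL",
}


def drop_gapped_columns(aln, gap="-"):
    """Remove every alignment column in which at least one row has a gap."""
    names = list(aln)
    lengths = {len(aln[name]) for name in names}
    if len(lengths) != 1:
        raise ValueError("all rows must have the same aligned length")
    columns = zip(*(aln[name] for name in names))
    kept = [column for column in columns if gap not in column]
    return {
        name: "".join(column[i] for column in kept)
        for i, name in enumerate(names)
    }


trimmed = drop_gapped_columns(alignment)
for name, sequence in trimmed.items():
    print(name, sequence)
```

Complete deletion is conservative: it discards informative sites along with the gaps, which is why the comparison between the full and trimmed trees, rather than either tree alone, is the useful result.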

PD LOCALIZED GHL17 PROTEINS ARE CONTAINED IN THE α CLADE

Since cell wall composition and PD complexity evolved during land plant
colonization, it seems logical to assume that callose, and specialized callose
metabolic enzymes, were adopted at some stage during this evolutionary process
to regulate PD aperture. The presence of charophytic sequences and the proximity
to a fungal branch suggest a more ancestral origin for membrane proteins
included in the beta clade (Figure 3B). We hypothesize that PD-targeted GHL17
proteins evolved with the appearance of early embryophytes and are hence likely
to be contained within the alpha clade (Figure 3C).

The Bayesian tree shows (with high support values) 10 predicted membrane
proteins (types 2 and 3) from Arabidopsis contained in the alpha clade, whereas
10 type 2 sequences appeared in a compact clade within the beta subgroup
surrounded by sequences isolated from green algae (Figures 3B,C). Several
publications have reported the intracellular localization of GHL17 proteins in
Arabidopsis. The root developmental regulators At3g13560, At2g01630, and
At1g66250 (Benitez-Alfonso et al., 2013) and the virus-induced protein At5g42100
(Levy et al., 2007) were PD-localized, whereas At3g57260 was preferentially
expressed in the apoplast (Zavaliev et al., 2013). Confirming our hypothesis,
all PD-localized proteins were grouped in the alpha clade (Figure 3C).

The localization of a few GHL17 proteins from Populus has been recently reported
(Pechanova et al., 2010; Rinne et al., 2011). To test the relationship between
the appearance of the alpha clade and protein localization, we constructed a
Bayesian tree with GHL17 sequences isolated from Arabidopsis and from Populus
trichocarpa. BLAST searches against the Populus genome identified a total of 54
non-redundant sequences containing the GH17 domain (Table 1). Classification of
these sequences according to bioinformatic predictions identified 21 putative
membrane proteins (Table 2). A multiple sequence alignment was conducted and
unrooted phylogenetic trees were generated using the Bayesian, ML and NJ
algorithms (Figure 4 and Supplementary data 2 and 4). According to tree
topology, Populus GHL17 proteins also appeared grouped in 3 clades, α, β, and γ,
each well supported by high probability values in each tree (Figure 4 and
Supplementary data 4). As before, type 3 proteins were contained within the α
clade whereas type 2 proteins were distributed between the α and β clades.

FIGURE 3 | Bayesian phylogenetic consensus tree of GHL17 sequences isolated from
fungi, green algae and embryophytes representatives (A). All sequences are cited
in Table 1 and the alignment is provided in Supplementary data 1. Bayesian
posterior probabilities are indicated in the branches. Clades α (in green), β
(in yellow), and γ (in red), as defined for Arabidopsis in Doxey et al. (2007),
are indicated. Fungi sequences form a separate group consistent with a different
evolutionary lineage. (B) shows a close-up of clade β and (C) shows a portion of
the α clade. Algae sequences are arrowed in (B) and membrane predicted proteins,
type 2 and 3, are marked in red circles and red triangles respectively.

Orthologs of PtGHL17_18 and PtGHL17_26 were both 
found to target PD whereas PtGHL17_48 and PtGHL17_49 
orthologs were mainly localized at the PM and lipid bodies 
(Rinne et al., 2011). As expected, PtGHL17_18 and PtGHL17_26 
are membrane predicted proteins contained in the alpha clade 
(Figure 4). The results confirmed a potential link between 
the phylogenetic distribution of GHL17 proteins and their 
intracellular localization. 

USING PHYLOGENETIC DISTRIBUTION TO DISCRIMINATE BETWEEN 
CANDIDATES FOR PD LOCALIZATION 

To identify novel PD components, the proteomic composition of PD-enriched cell
walls has been analyzed (Bayer et al., 2006; Fernandez-Calvino et al., 2011).
Several GHL17 proteins were isolated through these screens, including the
predicted membrane-localized proteins At3g13560, At5g42100, At4g31140, and
At5g58090. Unlike At3g13560 and At5g42100 (included in the alpha clade),
At4g31140 and At5g58090 were found in clade beta. Successful separation of the
PD membranous section from the desmotubule and the PM is quite challenging (if
not impossible); therefore a number of false positives was expected. The results
presented above suggest that proteins excluded from the alpha clade are not
likely to be targeted to PD sites. Therefore, we tested the intracellular
localization of At4g31140 and At5g58090, using At3g13560-mCitrine (a previously
PD-localized GHL17 protein) as a control. mCitrine fluorescent fusions were
obtained and expressed transiently in tobacco leaves. The results are shown in
Figure 5. Transient expression of either At4g31140-mCit or At5g58090-mCit led to
protein accumulation in the apoplast (Figures 5A-C). At5g58090-mCit also appears
to be associated with the endoplasmic reticulum (data not shown).

Transient assays can be misleading. Therefore we obtained stable transgenic
lines expressing p35S-At5g58090-mCit to confirm the subcellular localization of
this protein. Leaves isolated from 10-day-old seedlings expressing
p35S-At5g58090-mCit and leaves isolated from seedlings overexpressing
At3g13560-mCit (grown on the same plate) were stained with aniline blue to
reveal callose deposits at PD sites. The intracellular localization of these
proteins in stable lines reproduced the results obtained in transient assays
(Figures 5D,E): At5g58090-mCit was found at the cell periphery and in the
apoplast whereas At3g13560-mCit was found in a punctate pattern along the cell
wall (presumably PD sites). Co-localization with callose deposits at PD was
found for At3g13560 but not for At5g58090 (white arrows in Figures 5D,E). This
result suggests that PD localization of GHL17 proteins could be related to their
evolutionary origin, and hence to the appearance of the alpha clade.
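The screening logic used in this section can be written down as a simple rule: a membrane-predicted GHL17 candidate is kept as a likely PD protein only if it falls in the alpha clade. The sketch below applies that rule to the four proteome candidates discussed above and compares the prediction with the localization reported in the text. The dictionary layout and function names are illustrative assumptions, and four proteins are far too few to estimate how often the rule would fail.

```python
# Clade membership and reported localization, as stated in the text.
# The data structure and names are illustrative assumptions.
candidates = {
    "At3g13560": {"clade": "alpha", "observed": "PD"},
    "At5g42100": {"clade": "alpha", "observed": "PD"},
    "At4g31140": {"clade": "beta", "observed": "apoplast"},
    "At5g58090": {"clade": "beta", "observed": "apoplast"},
}


def predicted_pd(clade):
    """Rule of thumb from the text: only alpha-clade proteins are PD candidates."""
    return clade == "alpha"


shortlist = [
    name for name, info in candidates.items() if predicted_pd(info["clade"])
]
agreement = {
    name: predicted_pd(info["clade"]) == (info["observed"] == "PD")
    for name, info in candidates.items()
}

print("Retained as likely PD-localized:", shortlist)
print("Prediction matches reported localization:", agreement)
```

Used this way, clade membership acts as a preliminary filter before localization is tested experimentally, not as a replacement for the fluorescent fusion assays.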

DISCUSSION 

GHL17 proteins play many different roles in plant development and response to
biotic and abiotic stresses (Doxey et al., 2007). Functional specialization can
be predicted by studying protein sequence, gene expression and phylogeny (Doxey
et al., 2007). Here, we used phylogenetic tree reconstruction to study when in
land plant evolution GHL17 membrane proteins diversified to play a role at PD.
First, we identified sequences encoding a GH17 domain in representatives of
green algae, fungi, bryophytes and vascular plants. Fungi, like plants, deposit
callose at the cell wall but do not form plasmodesmata connections. Therefore
they are ideal organisms in which to analyze the evolution of 1,3 beta
glucanases in a different lineage.

Study of the protein sequences isolated suggests that the key amino acids
involved in GH17 catalytic activity are highly conserved throughout evolution.
This is in agreement with other reports that demonstrate the presence of beta
1,3 glucans in the cell wall of ancient unicellular algae, where they are
required for cell division and cell wall biogenesis (Scherp et al., 2001;
Sorensen et al., 2011). Specialization of GHL17 proteins to play specific roles
in the control of PD transport is therefore likely a consequence of evolutionary
functional diversification within this family.

Classification of embryophyte GHL17 proteins according to the presence or
absence of a signal peptide, of a GPI-anchored domain and of one or more
carbohydrate binding domains (X8) predicted PM or PD localization for a set of
proteins. The number of membrane-predicted proteins increased from 4 identified
in moss to 21-22 identified in vascular plants, suggesting that an expansion
occurred in this protein family during land plant evolution. This might have
been necessary to support the adaptation of multicellular organisms to
terrestrial environments, which might require specialized GHL17 proteins to
assume divergent or redundant functions at the PM or membranous subdomains.

Using phylogenetic analysis we found that membrane-targeted sequences are evenly
distributed in two major clades (Figure 3). Clade alpha contained GHL17
sequences that appeared in embryophytes only, whereas the beta clade comprised
land plant and algae proteins and is closely related to a branch containing
fungi sequences. This result suggests that clade alpha evolved early during land
colonization in the Streptophyte lineage, whereas clade beta is formed by
proteins of a more ancestral origin (Figures 3B,C). Ultrastructural studies
revealed the accumulation of callose at PD sites in early embryophytes (Scherp
et al., 2001; Schuette et al., 2009); therefore GHL17 proteins participating in
the regulation of callose at PD sites will likely appear in clade alpha.

Indeed, we noticed that all Arabidopsis PD-located GHL17 proteins (identified to
date) are clustered in the alpha clade. This established an interesting link
between the phylogenetic distribution of GHL17 proteins and their intracellular
localization. This correlation was confirmed in Populus: membrane proteins
belonging to the alpha clade were reported to localize at PD but this was not
the case for proteins contained in other clades (Rinne et al., 2011). We tested
the use of this knowledge for the discrimination of false positives isolated in
a proteomic screen of Arabidopsis PD. Two proteins from the beta clade were
identified in the PD proteome but intracellular localization of mCitrine protein
fusions revealed that they accumulate in the apoplast (Figure 5). Our results
suggest that phylogenetic analysis could potentially be a useful tool for the
preliminary detection of false positives when screening for PD-localized GHL17
proteins.

Frontiers in Plant Science | Plant Cell Biology
May 2014 | Volume 5 | Article 212 | 10
Gaudioso-Pedraza and Benitez-Alfonso
Origin and evolution of PD-located GHL17

FIGURE 4 | Majority consensus tree generated by Bayesian inference of phylogeny
of GHL17 proteins isolated from A. thaliana (At) and P. trichocarpa (Pt)
(sequences cited in Table 1). Bayesian posterior probabilities are indicated in
the tree branches. In accordance with the phylogenetic tree presented in
Figure 3, branches forming clades α (green), β (yellow) and γ (black) have been
indicated. Type 2 and 3 proteins (GPI-anchored proteins) are indicated with red
circles and red triangles respectively. The position of PtGHL17_18 and
PtGHL17_26 (reported to localize at PD by Rinne et al., 2011), as well as the
position of PD-localized Arabidopsis proteins, has been indicated with arrows.

FIGURE 5 | Intracellular localization of GHL17 protein mCitrine fusions. (A,B,C)
Show At4g31140-mCit, At5g58090-mCit, and At3g13560-mCit transient expression in
tobacco leaves. Chloroplast auto-fluorescence appears in red. (D,E) Show
At5g58090-mCit and At3g13560-mCit fluorescence (green) in Arabidopsis leaves
expressing the fusion proteins under the 35S promoter. Aniline blue staining of
callose deposits (blue) and the green and blue channels superimposed are also
shown. Notice that At3g13560 expression, but not At5g58090, co-localizes with
callose deposits at PD (white arrows). Scale bars = 20 μm.

To summarize, the results obtained so far suggest that, during (or immediately
after) colonization of terrestrial habitats by streptophytes, the GHL17 gene
family evolved and expanded to play specialized roles at the cell membrane,
including PD regulation. Completion of genome sequences and further studies on
callose regulation in ancestral charophyceans will be essential to confirm or
refute this theory. Study of phylogenetic relationships between ancestral
PM-targeted GHL17 and those that evolved with embryophytes was used here to
discriminate between PD-localized and non-PD-localized proteins in Arabidopsis
and Populus. This knowledge could theoretically be applied to the preliminary
screening of GHL17 proteins (aiming to identify those that serve specialized
roles at PD sites) in other land plant representatives.